Interlab from iGEM 2015


In [1]:
%matplotlib inline
import pylab as pl
from math import sqrt
import sys

# importing platemate
sys.path.insert(0, '../src')
import platemate

Configuring and reading your data

You first need to map each column in your plate to a colony or a control.


In [2]:
ColumnNames = {
    'C' : "Dev1",
    'D' : "Dev2",
    'E' : "Dev3"
    }

controlNames = {
    'A' : "LB",
    'B' : "LB+Cam",
    'F' : "+control",
    'G' : "-control1",
    'H' : "-control2"
    }

Then, we an instance of PlateMate:


In [19]:
reload(platemate)
pm = platemate.PlateMate( colonyNames = ColumnNames, controlNames = controlNames )

The variable pm above is what you will be using to read and parse your data, plot it, and analyze it. It will consider the plate mapping you have defined by colonyNames and controlNames. For instance, you can retrieve this information by using two functions below:


In [20]:
print pm.getColonyNames()
print pm.getControlNames()


['Dev1', 'Dev3', 'Dev2']
['LB', '-control2', 'LB+Cam', '-control1', '+control']


Let's get started with importing our data. During this process, PlateMate will look for all files that follow a pattern (in our case, each file starts with "medida"). Then, it will read each of those files and parse them. This is usually a fast process, even for larger data sets, and should take only a fraction of section to complete.


In [21]:
pm.findFiles("medida")
pm.readFluorescence()
pm.readOpticalDensity()

Now our instance pm has all information about the plate readings. We can get a summary from one of the well sets by using the function summary(). As an example, let us check the data from LB wells.


In [22]:
pm.summary("LB")


Out[22]:
A01 A02 A03 A04 A05 A06 A07 A08 A09
0 896.879 924.144 899.629 796.180 936.492 826.367 759.291 792.928 946.623
1 946.583 938.596 928.959 790.242 946.989 838.949 777.353 790.383 963.707
2 694.778 774.716 718.264 533.280 735.652 606.997 629.488 703.695 852.307

Each row displayed above represents a different measure in time (in our case, each measurement were spaced by 1 hour). Thus, we're looking at the data from the first 3 hours. If you look back in our map, all wells with LB were in column A. Each number with A refers to a different row on the actual plate, i.e., A04 represents column A row 4 in the original plate.

You can always retrieve the whole data from a population by using getFluorescence().


In [23]:
pm.getFluorescence("LB")


Out[23]:
A01 A02 A03 A04 A05 A06 A07 A08 A09
0 896.879 924.144 899.629 796.180 936.492 826.367 759.291 792.928 946.623
1 946.583 938.596 928.959 790.242 946.989 838.949 777.353 790.383 963.707
2 694.778 774.716 718.264 533.280 735.652 606.997 629.488 703.695 852.307
3 737.650 821.107 748.746 606.770 802.477 799.081 726.210 740.538 888.691
4 743.555 825.443 746.563 639.481 820.822 844.555 816.009 911.706 952.603
5 799.255 882.934 779.688 688.916 839.220 888.245 860.381 905.064 1024.023
6 782.279 809.345 773.809 742.165 820.198 836.612 880.490 901.366 1014.964
7 776.692 817.415 761.893 736.503 852.735 841.259 858.177 837.875 1007.036
8 776.348 877.195 747.081 723.228 825.820 858.616 866.621 996.143 998.920

Similarly, you can check the optical density associated with that particular population.


In [24]:
pm.getOpticalDensity("LB")


Out[24]:
A01 A02 A03 A04 A05 A06 A07 A08 A09
0 0.2661 0.3356 0.2544 0.3294 0.2807 0.2994 0.2424 0.2641 0.1133
1 0.2820 0.3573 0.2772 0.3622 0.3081 0.3329 0.2674 0.2913 0.1247
2 0.2820 0.3573 0.2772 0.3622 0.3081 0.3329 0.2674 0.2913 0.1247
3 0.3322 0.4105 0.3580 0.4219 0.3976 0.4179 0.3950 0.3974 0.1805
4 0.3322 0.4105 0.3580 0.4219 0.3976 0.4179 0.3950 0.3974 0.1805
5 0.3385 0.4095 0.3596 0.4139 0.3971 0.4018 0.3905 0.3776 0.1888
6 0.3464 0.4157 0.3699 0.4169 0.4054 0.3965 0.3873 0.3789 0.2007
7 0.3446 0.4203 0.3776 0.4336 0.4183 0.4082 0.4004 0.3973 0.2113
8 0.3545 0.4325 0.4016 0.4187 0.4292 0.4035 0.4060 0.4149 0.2361

Simple plotting

Because the file also contained information about the temperature at the time of the reading, platemate will also sotre it for any possible analysis.


In [29]:
# retrieving temperature
Temperature = pm.getTemperature()

# printing mean, min and max.
print "Average temperature: %4.1f'C" % ( Temperature.mean() )
print "Temperature range: %4.1f'C - %4.1f'C" % (Temperature.min(), Temperature.max())

pm.plotTemperature()
pl.show()


Average temperature: 24.1'C
Temperature range: 23.0'C - 25.8'C

In [26]:
print "Plotting all wells for each population"
pl.figure(figsize=(6,4.5))
pm.plotIt(["Dev1","Dev2","Dev3"])
pl.show()

print "Plotting averages for each population"
pl.figure(figsize=(6,4.5))
pm.plotMean(["Dev1","Dev2","Dev3"])
pl.show()

print "Plotting averages and 1-std intervals for each population"
pl.figure(figsize=(6,4.5))
pm.plotFuzzyMean(["Dev1","Dev2","Dev3"])
pl.show()


Plotting all wells for each population
Plotting averages for each population
Plotting averages and 1-std intervals for each population

In [27]:
pm.compareFluorescence("LB","Dev1")


Out[27]:
(0.0, 2.2027362480700291e-28)

In [28]:
pl.figure(figsize=(7,4))
pm.plotBars(["Dev1","Dev2","Dev3","-control1"], 5)
pl.show()


Using statistical tests, ANOVA and post-hoc Tukey's HSD test


An important part of comparing expressions of different devices/genes is the use of appropriate statistical testing. For instance, suppose that you want to find out if the response presented by Dev1 is significantly stronger than the response presented by the negative control 1. This can be solved by a simple statistical test such as testing Mann-Whitney's $U$ statistics. Below, we're showing the value evaluated for $U$ and p-value when comparing Dev1, Dev2 and Dev3 with -control1.


In [380]:
print "Device 1 vs -control1:"
print pm.compareFluorescence("Dev1","-control1")
print "Device 2 vs -control1:"
print pm.compareFluorescence("Dev2","-control1")
print "Device 3 vs -control1:"
print pm.compareFluorescence("Dev3","-control1")


Device 1 vs -control1:
(6561.0, 4.4054724961400581e-28)
Device 2 vs -control1:
(6532.0, 1.2860869327145435e-27)
Device 3 vs -control1:
(3366.0, 0.77585226092928528)

The result above shows that we cannot rule out the hypothesis that Dev3 is not significantly larger than the negative control. This completely agrees with the bar plots comparing all devices and the negative control. In other words, this basically shows that no significant expression was observed in our device 3.

Is the expression of device 3 at least stronger than LB?


In [359]:
print "Device 3 vs LB:"
print pm.compareFluorescence("Dev3","LB")
print "-control vs LB:"
print pm.compareFluorescence("-control1","LB")


Device 3 vs LB:
(4800.0, 3.6135744993300728e-07)
-control vs LB:
(5225.0, 7.420359894394196e-11)

This shows that it is, although the negative control also shows a response significantly larger than the LB medium.


Using all possible combinations is not the proper way to study multiple populations. Kruskal-Wallis' and (Post-Hoc) Dunn's tests are an extension of the $U$ statistics for comparing multiple populations at once.


Because ANOVA is a very popular test, platemate can perform ANOVA followed by a post-hoc test using Tukey's HSD.


In [360]:
pm.ANOVA(["Dev1","Dev2","Dev3","-control1"])


Out[360]:
(278.44071959391323, 7.5356734236160799e-89)

In [361]:
pm.TukeyHSD(["Dev1","Dev2","Dev3","-control1"])


Out[361]:
Group 1 Group 2 Mean diff Reject H0?
0 Dev1 Dev2 -1245.353679 True
1 Dev1 Dev3 -2184.682000 True
2 Dev1 -control1 -2179.667951 True
3 Dev2 Dev3 -939.328321 True
4 Dev2 -control1 -934.314272 True
5 Dev3 -control1 5.014049 False

This analysis shows that this data passes an ANOVA test (ruling out the null hypothesis that all populations have the same mean), and Tukey's test shos that except for Dev3 and -control1 is the only pair that does not show sgnificant change in their mean.

It is important to keep in mind, however, that ANOVA assumes that ...

  1. the samples are independent;
  2. each sample is from a normally distributed population;
  3. the population standard deviations of the groups are all equal.

If hypothesis 2 does not hold, then Kruskal-Wallis should be used instead.

Standardized promoter strength


In [ ]: